MPMQA: Multimodal Question Answering on Product Manuals

Abstract

Visual contents, such as illustrations and images, play a big role in product manual understanding. Existing Product Manual Question Answering (PMQA) datasets tend to ignore visual contents and retain only the textual parts. In this work, to emphasize the importance of multimodal contents, we propose a Multimodal Product Manual Question Answering (MPMQA) task. For each question, MPMQA requires the model not only to process multimodal contents but also to provide multimodal answers. To support MPMQA, a large-scale dataset, PM209, is constructed with human annotations; it contains 209 product manuals from 27 well-known consumer electronic brands. The human annotations include 6 types of semantic regions for manual contents and 22,021 question-answer pairs. In particular, each answer consists of a textual sentence and related visual regions from the manuals. Taking into account the length of product manuals and the fact that a question is always related to a small number of pages, MPMQA can be naturally split into two subtasks: retrieving the most related pages and then generating multimodal answers. We further propose a unified model that can perform these two subtasks together and achieve performance comparable to multiple task-specific models. The PM209 dataset is available at https://github.com/AIM3-RUC/MPMQA.
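The two-subtask split described in the abstract (page retrieval, then multimodal answer generation) can be illustrated with a toy sketch. This is not the authors' unified model: the token-overlap scoring, the `Region`, `Page`, and `MultimodalAnswer` classes, and the demo pages are all hypothetical stand-ins chosen only to show the data flow, in which an answer pairs a textual sentence with the identifiers of related regions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Region:
    """A semantic region on a manual page (hypothetical structure)."""
    region_id: str
    text: str  # text attached to the region; a real system would also use pixels


@dataclass
class Page:
    page_no: int
    regions: list[Region] = field(default_factory=list)


@dataclass
class MultimodalAnswer:
    text: str               # textual part of the answer
    region_ids: list[str]   # related visual regions, referenced by id


def overlap(question: str, text: str) -> int:
    """Toy relevance score: number of shared lowercase tokens (an assumption)."""
    return len(set(question.lower().split()) & set(text.lower().split()))


def retrieve_pages(question: str, pages: list[Page], k: int = 1) -> list[Page]:
    """Subtask 1: rank pages by summed region overlap and keep the top k."""
    def page_score(page: Page) -> int:
        return sum(overlap(question, r.text) for r in page.regions)
    return sorted(pages, key=page_score, reverse=True)[:k]


def answer(question: str, pages: list[Page], k: int = 1) -> MultimodalAnswer:
    """Subtask 2 (placeholder): pick the best-matching region on retrieved pages."""
    candidates = [r for p in retrieve_pages(question, pages, k) for r in p.regions]
    if not candidates:
        return MultimodalAnswer(text="", region_ids=[])
    best = max(candidates, key=lambda r: overlap(question, r.text))
    return MultimodalAnswer(text=best.text, region_ids=[best.region_id])


# Hypothetical demo data, not taken from PM209.
demo_pages = [
    Page(1, [Region("p1-r1", "Press the power button to turn on the device")]),
    Page(2, [
        Region("p2-r1", "Insert the battery into the rear slot"),
        Region("p2-r2", "Battery slot illustration"),
    ]),
]

result = answer("how to insert the battery", demo_pages)
```

Restricting the second stage to the retrieved pages mirrors the abstract's observation that a question relates to only a small number of pages; a learned retriever and generator would replace both toy functions.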

Similar resources

Investigating Embedded Question Reuse in Question Answering

The investigation presented in this paper is a novel method in question answering (QA) that enables a QA system to gain performance by reusing information in the answer to one question to answer another related question. Our analysis shows that a pair of questions in general open-domain QA can have an embedding relation through their mentions of noun phrase expressions. We present methods f...

Intelligent product manuals

Intelligent product manuals are designed to allow product users to utilize a product as easily, effectively and with as little additional care as possible while minimizing support costs for manufacturers and suppliers. It is first shown how intelligent product manuals address these objectives by utilizing electronic, multimedia and knowledge-based technologies to provide active assistance to the ...

Question Answering on SQuAD

In this project, we exploit several deep learning architectures in the Question Answering field, based on the newly released Stanford Question Answering dataset (SQuAD)[7]. We introduce a multi-stage process that encodes context paragraphs at different levels of granularity, uses a co-attention mechanism to fuse representations of questions and context paragraphs, and finally decodes the co-attention...

Speech Grammars for Textual Entailment Patterns in Multimodal Question Answering

Over the last several years, speech-based question answering (QA) has become very popular in contrast to pure search engine based approaches on a desktop. Open-domain QA systems are now much more powerful and precise, and they can be used in speech applications. Speech-based question answering systems often rely on predefined grammars for speech understanding. In order to improve the coverage o...

Multimodal Question Answering over Structured Data with Ambiguous Entities

In recent years, we have witnessed profound changes in the way people satisfy their information needs. For instance, with the ubiquitous 24/7 availability of mobile devices, the number of search engine queries on mobile devices has reportedly overtaken that of queries on regular personal computers. In this paper, we consider the task of multimodal question answering over structured data, in whi...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2023

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v37i11.26634